MG205: Econometrics Theory and Applications

Topic 8: Exploiting Time Variation

José Ignacio González Rojas

London School of Economics and Political Science

February 9, 2026

Cross-Sectional Data Cannot Separate Heterogeneity from Treatment Effects

Panel Data Gives Us New Tools to Address Endogeneity

The problem

  • Endogeneity: \(\text{Cov}(x_{it}, e_{it}) \neq 0\)
  • Violation of Assumption 5 \(\Rightarrow\) no identification
  • OLS gives biased estimates of the parameters of interest
  • Cross-sectional data alone cannot fix this

Today

  • Assume a particular error structure: \(e_{it} = \alpha_i + u_{it}\)
  • With panel data, construct estimators invariant to \(\alpha_i\)

Following the same units over time enables new identification and estimation strategies.

Two Estimators Remove Unit-Level Unobserved Heterogeneity

First Differences and LSDV

First Differences (FD)

  • Population model: \(y_{it} = \beta x_{it} + \alpha_i + u_{it}\)
  • Subtract consecutive observations:

\[\Delta y_{it} = \beta \Delta x_{it} + \Delta u_{it}\]

  • \(\alpha_i - \alpha_i = 0\): unobserved heterogeneity disappears

Least Squares Dummy Variables (LSDV)

  • Include a dummy for each unit \(i\):

\[y_{it} = \beta_0 + \beta x_{it} + \sum_{j=2}^{N} \gamma_j \mathbb{1}[i=j] + u_{it}\]

  • The dummies absorb \(\alpha_i\)
  • Equivalent to FD for \(T=2\) (we prove this later)
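As a rough numerical check of the two estimators, the following sketch simulates a two-period panel and compares pooled OLS, FD, and LSDV. The data-generating process, the parameter values, and all variable names are assumptions made for illustration; they are not taken from the exercises. FD is run without an intercept and LSDV with one dummy per unit and no time dummy, which is the case in which the two slopes coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 200, 2, 1.5            # hypothetical panel size and true slope

alpha = rng.normal(size=N)                                   # unit effect alpha_i
x = 0.8 * alpha[:, None] + rng.normal(size=(N, T))           # x is correlated with alpha_i
y = beta * x + alpha[:, None] + rng.normal(scale=0.5, size=(N, T))

# Pooled OLS with an intercept: alpha_i stays in the error term
X_pooled = np.column_stack([np.ones(N * T), x.ravel()])
b_pooled = np.linalg.lstsq(X_pooled, y.ravel(), rcond=None)[0][1]

# First differences: regress dy on dx (no intercept, no time effect in this DGP)
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

# LSDV: regress y on x and a full set of unit dummies (no separate intercept)
unit = np.repeat(np.arange(N), T)                            # matches the row order of x.ravel()
D = (unit[:, None] == np.arange(N)[None, :]).astype(float)
X_lsdv = np.column_stack([x.ravel(), D])
b_lsdv = np.linalg.lstsq(X_lsdv, y.ravel(), rcond=None)[0][0]

print(f"pooled OLS: {b_pooled:.3f}  FD: {b_fd:.3f}  LSDV: {b_lsdv:.3f}")
```

In this simulated design the pooled slope drifts away from the true value because \(x_{it}\) is correlated with \(\alpha_i\), while FD and LSDV agree with each other up to floating-point error.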

Exercise 1: Unobserved Heterogeneity Biases Cross-Sectional Estimates

Airline Fares Depend on Unobserved Route and Time Characteristics

Two Sources of Omitted Variable Bias

\[ \begin{aligned} \log(\text{fare})_{it} &= \beta_0 + \beta_1\log(\text{distance})_i + \beta_2\text{competition}_{it} + e_{it} \\ e_{it} &= \gamma_i + \delta_t + u_{it} \end{aligned} \]

  • \(\gamma_i\): route-specific, time-invariant unobserved heterogeneity
    • Business relationships
    • Airport amenities
  • \(\delta_t\): common time shocks
    • Fuel prices
    • Economic conditions
  • \(u_{it}\) is idiosyncratic error
  • We worry that \(\text{Cov}(\text{competition}_{it}, \gamma_i) \neq 0\)
    • Model not identified
    • OLS estimates the coefficients of a linear projection
    • \(\hat{\beta}\) may be a biased estimate of the structural parameter \(\beta\)

Controlling for Distance and Year Dummies Does Not Remove Route Heterogeneity

The Proposed Model Falls Short

Estimated model

\[\begin{align*} \widehat{\log(\text{fare})}_{it} &= \hat{\beta}_{0} + \hat{\beta}_{1}\log(\text{distance})_{i} \\ &+ \hat{\beta}_{2}\text{competition}_{it} \\ &+ \hat{\delta}_{1}\mathbb{1}[t=2007] \\ &+ \hat{\delta}_{2}\mathbb{1}[t=2012] \end{align*}\]

What remains in the error?

  • Recall: \(e_{it} = \gamma_i + \delta_t + u_{it}\)
  • The year dummies address common time trends (\(\delta_t\))
  • \(\gamma_i\) remains in the error

Since \(\text{Cov}(\text{competition}_{it}, \gamma_i) \neq 0\), OLS is biased.

First-Differencing Eliminates Route-Level Unobserved Heterogeneity

The First-Difference Estimator

  • First-difference operator: \(\Delta x_{it} = x_{it} - x_{it-1}\)
  • Example: Take the USA–UK route. Subtract 2002 from 2007, and 2007 from 2012.

\[ \Delta\log(\text{fare})_{it} = \beta_2\Delta\text{competition}_{it} + \Delta\delta_t + \Delta u_{it} \]

\(\gamma_i - \gamma_i = 0\): time-invariant route characteristics disappear.

With year dummies for transition periods (2002–2007 base, 2007–2012):

\[ \widehat{\Delta\log(\text{fare})}_{it} = \hat\alpha + \hat\beta_2\Delta\text{competition}_{it} + \hat\delta\mathbb{1}[\text{transition } 2007-2012] \]
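As a sketch of how the coefficients of the estimated FD equation map back to the time effects, label the three year effects \(\delta_{2002}\), \(\delta_{2007}\), \(\delta_{2012}\) (this labelling is an assumption; the slides write only \(\delta_t\)) and let \(\alpha\) and \(\delta\) denote the population counterparts of \(\hat\alpha\) and \(\hat\delta\). Assuming \(\mathbb{E}[\Delta u_{it} \mid \Delta\text{competition}_{it}] = 0\):

\[\begin{align*} \text{Transition 2002 to 2007:} \quad \alpha &= \delta_{2007} - \delta_{2002} \\ \text{Transition 2007 to 2012:} \quad \alpha + \delta &= \delta_{2012} - \delta_{2007} \end{align*}\]

Under this reading, the intercept is the common change in log fares over the base transition, and the dummy coefficient is the difference in the common change between the two transitions.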

Combining FD with Year Dummies Addresses Both Sources

Two Strategies for Two-Way Heterogeneity

FD removes \(\gamma_i\) (unit FE)

  • Subtract consecutive observations
  • \(\gamma_i - \gamma_i = 0\)
  • Time-invariant variables also drop out: \(\Delta\log(\text{distance})_i = 0\)

Year dummies absorb \(\delta_t\) (time FE)

  • Include dummies for transition periods
  • Common time shocks captured
  • This is LSDV applied to time effects

(1) FD + year dummies, or (2) full LSDV with dummies for both units and time periods.

Rejecting the Null Does Not Validate the Model

The Trap

  • With robust \(t\)-statistics, we reject \(H_{0}: \beta_{\text{competition}} = 0\)
  • But the test is only informative if the model is correctly specified
  • If OVB remains (route-level heterogeneity not addressed), \(\hat{\beta}\) is biased
  • Statistical significance \(\neq\) valid causal interpretation
  • The estimate is a linear projection
  • FD reduces bias from time-invariant confounders but does not eliminate all sources

Exercise 2: Time Effects Capture Industry-Wide Patent Growth

Industry-Wide Patent Growth Requires Flexible Time Effects

Setting

  • 37 pharmaceutical firms
  • 2005-2007
  • No OVB concerns
    • Causal interpretation
  • Patents growing industry-wide
    • Regardless of individual firm R&D

Model

\[\begin{align*} \log(\text{patents})_{it} &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ &+ \beta_2\mathbb{1}[t=2006] + \beta_3\mathbb{1}[t=2007] \\ &+ e_{it} \end{align*}\]

  • Could model the trend linearly or quadratically
  • Year dummies allow any form — nonparametric
  • \(\beta_{1}\): elasticity of patents w.r.t. R&D (causal)

Year Dummy Coefficients Measure Growth Rates

Conditional Expectations Are The Tool to Interpret

\[\begin{align*} \mathbb{E}[\log(\text{patents})_{it} \mid t=2005] &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2006] &= (\beta_0 + \beta_2) + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2007] &= (\beta_0 + \beta_3) + \beta_1\log(\text{R\&D})_{it} \end{align*}\]

  • Average log change in patents across all firms
  • Geometric mean growth rate of patents in the industry, holding R&D constant
    • \(\beta_2\): 2005 to 2006
    • \(\beta_3\): 2005 to 2007

Year dummies measure growth rates between periods — not “the level in 2006 vs 2005.”
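A worked conversion from log points to a percentage growth rate, using a purely hypothetical value \(\beta_2 = 0.10\) (not an estimate from this exercise):

\[ 100 \times \left(e^{\beta_2} - 1\right) = 100 \times \left(e^{0.10} - 1\right) \approx 10.5\% \]

For small coefficients the log-point value and the percentage change are close (0.10 versus roughly 10.5%); the gap widens as the coefficient grows.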

Interactions Allow the R&D Elasticity to Vary Over Time

Heterogeneous Elasticities

\[ \begin{align*} \log(\text{patents})_{it} &= \beta_0 + \beta_1\log(\text{R\&D})_{it} + \beta_2\mathbb{1}[t=2006] + \beta_3\mathbb{1}[t=2007] \\ &+ \beta_4(\log(\text{R\&D})_{it} \times \mathbb{1}[t=2006]) + \beta_5(\log(\text{R\&D})_{it} \times \mathbb{1}[t=2007]) \\ &+ e_{it} \end{align*} \]

Conditional means by year

\[\begin{align*} \mathbb{E}[\log(\text{patents})_{it} \mid t=2005] &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2006] &= (\beta_0 + \beta_2) + (\beta_1 + \beta_4)\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2007] &= (\beta_0 + \beta_3) + (\beta_1 + \beta_5)\log(\text{R\&D})_{it} \end{align*}\]

Level Shifts and Slope Shifts Are Separately Identified

Decomposing Differential Effects

| Year | Intercept | Elasticity |
|------|-----------|------------|
| 2005 | \(\beta_0\) | \(\beta_1\) |
| 2006 | \(\beta_0 + \beta_2\) | \(\beta_1 + \beta_4\) |
| 2007 | \(\beta_0 + \beta_3\) | \(\beta_1 + \beta_5\) |

How the patents-R&D elasticity changes over time

  • \(\beta_4\): elasticity change 2005 \(\to\) 2006
  • \(\beta_5\): elasticity change 2005 \(\to\) 2007

No need to interpret \(\beta_4\) and \(\beta_5\) individually; the conditional means do the work.
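One way to see why the conditional means do the work: a model with year dummies and a full set of year interactions reproduces the intercept and slope of a separate regression for each year. The sketch below illustrates this on simulated data. The data, the coefficient values, and the variable names are hypothetical; only the model structure follows the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms = 37
years = np.array([2005, 2006, 2007])

# Hypothetical firm-year data (values are made up for illustration)
year = np.tile(years, n_firms)
log_rd = rng.normal(loc=3.0, scale=1.0, size=year.size)
true_intercept = {2005: 1.0, 2006: 1.1, 2007: 1.3}
true_slope = {2005: 0.5, 2006: 0.6, 2007: 0.4}
log_pat = np.array([true_intercept[t] + true_slope[t] * r
                    for t, r in zip(year.tolist(), log_rd)])
log_pat = log_pat + rng.normal(scale=0.2, size=year.size)

# Interacted model: constant, log R&D, year dummies, and year x log R&D
d06 = (year == 2006).astype(float)
d07 = (year == 2007).astype(float)
X = np.column_stack([np.ones(year.size), log_rd, d06, d07, log_rd * d06, log_rd * d07])
b = np.linalg.lstsq(X, log_pat, rcond=None)[0]

# Year-specific (intercept, elasticity) pairs implied by the interacted model
implied = {
    2005: (b[0], b[1]),
    2006: (b[0] + b[2], b[1] + b[4]),
    2007: (b[0] + b[3], b[1] + b[5]),
}

def ols_by_year(t):
    """Intercept and slope from a regression using only year t."""
    m = year == t
    Xt = np.column_stack([np.ones(m.sum()), log_rd[m]])
    return np.linalg.lstsq(Xt, log_pat[m], rcond=None)[0]

for t in (2005, 2006, 2007):
    print(t, np.round(implied[t], 4), np.round(ols_by_year(t), 4))
```

The two columns printed for each year should match, which is why reading the table of intercepts and elasticities row by row is equivalent to interpreting three separate regressions.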

Exercise 3: Empirical Models Interact All Relevant Variables

Gender Wage Gaps Changed After the Mining Boom

Three Patterns from the Data

  • Gender: Men earn a constant wage premium over women
  • Time trend: Wages grow over time for both groups
  • Structural break (2005): After the mining company arrives, the male premium widens

The Empirical Model Interacts Gender, Time, and Post-2005

Eight Coefficients for Four Groups

\[ \begin{align*} \log(\text{wages})_{it} &= \beta_0 + \beta_1\mathbb{1}[i\text{ is male}] + \beta_2 t + \beta_3\mathbb{1}[t \geq 2005] \\ &+ \beta_4(\mathbb{1}[i\text{ is male}] \times t) + \beta_5(\mathbb{1}[i\text{ is male}] \times \mathbb{1}[t \geq 2005]) \\ &+ \beta_6(t \times \mathbb{1}[t \geq 2005]) + \beta_7(t \times \mathbb{1}[t \geq 2005] \times \mathbb{1}[i\text{ is male}]) \\ &+ e_{it} \end{align*} \]

This model captures level differences, trends, and how both changed after 2005, separately for men and women.

Conditional Means: Pre-2005

Women and Men Before the Structural Break

Women before 2005 (base category)

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=0,t<2005] = \beta_0 + \beta_2 t \]

Men before 2005

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=1,t<2005] = (\beta_0 + \beta_1) + (\beta_2 + \beta_4)t \]

\(\beta_1\) shifts the intercept; \(\beta_4\) shifts the slope.

Conditional Means: Post-2005

Women and Men After the Structural Break

Women after 2005

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=0,\; t \geq 2005] = (\beta_0 + \beta_3) + (\beta_2 + \beta_6)t\]

Men after 2005

\[\begin{align*} \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=1,\; t \geq 2005] &= (\beta_0 + \beta_1 + \beta_3 + \beta_5) \\ &\quad + (\beta_2 + \beta_4 + \beta_6 + \beta_7)t \end{align*}\]

Each coefficient modifies either the intercept or slope for a specific group-period combination.

Taking Differences Isolates Each Coefficient’s Role

Condition on Group, Then Difference
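The differencing step can be sketched directly from the four conditional means above. The "post minus pre" rows compare the two regime lines evaluated at the same value of \(t\), which is a hypothetical comparison used only to isolate coefficients:

\[\begin{align*} \text{Men} - \text{women},\; t < 2005: &\quad \beta_1 + \beta_4 t \\ \text{Men} - \text{women},\; t \geq 2005: &\quad (\beta_1 + \beta_5) + (\beta_4 + \beta_7)t \\ \text{Post} - \text{pre, women}: &\quad \beta_3 + \beta_6 t \\ \text{Post} - \text{pre, men}: &\quad (\beta_3 + \beta_5) + (\beta_6 + \beta_7)t \\ \text{Change in the gender gap}: &\quad \beta_5 + \beta_7 t \end{align*}\]

Each line follows by subtracting one conditional mean from another, so \(\beta_5\) and \(\beta_7\) appear only where both the male indicator and the post-2005 indicator are switched on.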

Coefficient Signs Follow Directly from the Figure

Economic Interpretation

Positive (\(> 0\))

  • \(\beta_1\): male premium
  • \(\beta_2\): wages grow over time
  • \(\beta_7\): male wages grow faster post-2005

Zero (\(= 0\))

  • \(\beta_3\): no level break for women at 2005
  • \(\beta_4\): same pre-2005 growth rate
  • \(\beta_6\): female growth unchanged post-2005

Negative (\(< 0\))

  • \(\beta_5\): relative to the base group (women pre-2005), the intercept for men post-2005 is lower than what other coefficients predict

Exercise 4: Panel Data Enables Identification and Estimation

Panel Data Follows the Same Units Over Time

Definition and Structure

  • \(y_{it}\), \(x_{it}\) for \(i = 1, \ldots, N\) and \(t = 1, \ldots, T\)
  • Panel data: same units observed across multiple time periods
  • Cross-section: one \(t\) only
  • Repeated cross-section: same population, different individuals each period

Error decomposition

\[ e_{it} = \alpha_i + v_{it} \]

Panel Structure Enables Identification of Parameters

Identification vs Estimation

Identification

  • Could we recover unique values for each parameter?
    • Cross-section: \(\alpha_i\) in error, correlated with \(x_{it}\) \(\to\) cannot identify \(\beta\)
    • Panel: difference out \(\alpha_i\)
  • Requires: \(\mathbb{E}[v_{it} \mid x_{it}] = 0\)

Estimation

  • Given identification, how do we compute \(\hat\beta\) from the data?

\[ \Delta y_i = \delta + \beta_1\Delta x_{i1} + \cdots + \beta_k\Delta x_{ik} + \Delta v_i \]

  • Requires \(T \geq 2\) and within-unit variation

Example: cannot estimate returns to education via FD if education does not change over time.

Panel Data Reduces OVB but Cannot Eliminate All Sources of Bias

Solves

  • Time-invariant OVB (\(\alpha_i\))
  • Unit-invariant OVB (\(\lambda_t\))

Does not solve

  • Time-varying confounders (left in \(v_{it}\))
  • Measurement error
  • Selection bias

Less scope for OVB, but not zero.

Exercise 5: First-Differencing Amplifies Measurement Error

Education Is Measured with Error in Both Periods

True vs Observed Variables

\[ \text{education}_{it} = \text{education}^{*}_{it} + e_{it} \]

Assumptions

  • \(e_{it}\) uncorrelated with true education and other variables
  • Education varies little over time for adults

Cross-Sectional Attenuation Bias Shrinks the Coefficient Towards Zero

The Baseline Problem

Population model

\[ \log(\text{wage})_i = \alpha + \beta\text{education}^{*}_i + \epsilon_i \]

Observed model substitutes \(\text{education}_i = \text{education}^{*}_i + e_i\)

Attenuation bias

\[ \text{plim}\;\hat\beta = \beta \cdot \frac{\text{Var}(\text{educ}^*)}{\text{Var}(\text{educ}^*) + \text{Var}(e)} \]

The ratio is less than 1, so the coefficient is biased towards zero (derived in Topic 6).

First-Differencing Increases the Measurement Error Variance

The Panel Data Paradox

FD of observed education

\[ \Delta\text{education}_i = \Delta\text{education}^{*}_{i} + (e_{i2} - e_{i1}) \]

If \(e_{i1}\) and \(e_{i2}\) are uncorrelated

\[ \text{Var}(e_{i2} - e_{i1}) = \text{Var}(e_{i1}) + \text{Var}(e_{i2}) \]

Panel Data Involves a Fundamental Bias Trade-off

FD Eliminates Fixed Confounders but Amplifies Measurement Error

\[ \text{plim}\;\hat\beta_{\text{FD}} = \beta \cdot \frac{\text{Var}(\Delta\text{educ}^*)}{\text{Var}(\Delta\text{educ}^*) + \text{Var}(e_{i1}) + \text{Var}(e_{i2})} \]

Benefits

  • Eliminates \(\alpha_i\), reduces OVB from time-invariant confounders
  • Enables causal identification under strict exogeneity

Costs

  • Measurement error variance grows in denominator
  • Numerator small if \(x\) changes little over time

Attenuation ratio is smaller — more severe bias towards zero.
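The comparison of attenuation ratios can be illustrated with a small simulation. This is a sketch under stated assumptions: the variances, the true coefficient, and the variable names are hypothetical, measurement errors are independent across periods, and there is no unit effect, so that only the measurement-error channel is at work.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 50_000, 0.08                      # hypothetical sample size and true return

educ1 = rng.normal(12.0, 3.0, n)            # true education in period 1 (variance 9)
educ2 = educ1 + rng.normal(0.0, 0.5, n)     # true education barely changes (variance of change 0.25)
e1 = rng.normal(0.0, 1.0, n)                # measurement error, period 1 (variance 1)
e2 = rng.normal(0.0, 1.0, n)                # measurement error, period 2, independent of e1
obs1, obs2 = educ1 + e1, educ2 + e2         # observed education

y1 = 1.0 + beta * educ1 + rng.normal(0.0, 0.3, n)
y2 = 1.0 + beta * educ2 + rng.normal(0.0, 0.3, n)

def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

b_cs = slope(obs1, y1)                      # cross-section, period 1
b_fd = slope(obs2 - obs1, y2 - y1)          # first differences

ratio_cs = 9.0 / (9.0 + 1.0)                # Var(educ*) / (Var(educ*) + Var(e))
ratio_fd = 0.25 / (0.25 + 1.0 + 1.0)        # Var(d educ*) / (Var(d educ*) + Var(e1) + Var(e2))

print(f"CS: estimate/beta = {b_cs / beta:.3f}, formula = {ratio_cs:.3f}")
print(f"FD: estimate/beta = {b_fd / beta:.3f}, formula = {ratio_fd:.3f}")
```

With these made-up variances the cross-sectional estimate keeps most of the true coefficient while the FD estimate retains only a small fraction of it, in line with the two formulas.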

Exercise 6: Panel Data Cannot Solve Selection Bias

Roommate Nationality May Affect Student Grades

Self-Selection into Rooms Creates Endogeneity

Let \(\text{same}_{it} = \mathbb{1}[i\text{ has same-nationality roommate in } t]\)

\[ \text{grades}_{it} = \alpha + \beta\;\text{same}_{it} + e_{it} \]

  • Exogeneity holds under random assignment of roommates
  • If students choose roommates: an omitted equation determines room selection
  • More outgoing students may prefer roommates of a different nationality and may also perform differently academically
  • \(\text{Cov}(\text{same}_{it}, e_{it}) \neq 0\) arises from selection, not unobserved heterogeneity

Two Years of Data Require Roommate Changes

Panel Structure and Exogenous Mobility

Year 1

\(\text{grades}_{i1} = \alpha + \beta\;\text{same}_{i1} + \alpha_i + u_{i1}\)

Year 2

\(\text{grades}_{i2} = \alpha + \beta\;\text{same}_{i2} + \alpha_i + u_{i2}\)

First-differencing

\(\Delta\text{grades}_i = \beta\;\Delta\text{same}_i + \Delta u_i\)

  • Critical: students must change roommates (\(\Delta\text{same}_i \neq 0\) for some)
  • Exogenous mobility design — plausible if the university reassigns rooms

The Problem Is Selection, Not Unobserved Heterogeneity

What Panel Data Cannot Fix

  • FD removes \(\alpha_i\) (unobserved ability) — but the core problem is selection into rooms
  • If reasons for changing roommates correlate with grade changes, FD does not help
  • Panel data addresses unobserved heterogeneity
  • It does not address selection bias

Random Assignment Plus Panel Data Strengthens Identification

You Get What You Pay For

  • Random assignment of roommates:
    • \(\beta\) is identified even in cross-section
    • Panel adds precision: removes \(\alpha_i\) from the error \(\to\) smaller variance \(\to\) smaller standard errors
  • Potential selection: panel data alone cannot solve the selection problem

Exercise 7: Clustered Standard Errors Account for Within-Unit Correlation

Serial Correlation Affects Inference

Clustering Is a Conservative Fix

The three pillars

  • Identification: Can we recover \(\beta\)?
  • Estimation: How do we compute \(\hat{\beta}\)?
    • Neither is affected by serial correlation
  • Inference: What can we learn about \(\beta\)?
    • Serial correlation violates i.i.d. sampling
    • Invalid SEs, p-values, and CIs

Why errors are correlated

  • Within-unit persistence
    • Worker earnings persist year to year
  • Within-group spillovers
    • Departmental training affects all workers
  • Cluster SEs at the unit level
    • Allows arbitrary within-unit correlation

Clustering does not change \(\hat{\beta}\) — only the standard errors and inference.
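A minimal sketch of the cluster-robust (sandwich) variance, computed by hand with NumPy on simulated data. The data-generating process and names are assumptions for illustration, and no small-sample correction is applied, so the numbers will differ slightly from what statistical packages report by default.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 300, 5                                   # hypothetical: 300 units, 5 periods
unit = np.repeat(np.arange(N), T)

# Regressor and error both have a persistent unit-level component,
# but the error is independent of x, so OLS remains unbiased
x = np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)
u = np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones(N * T), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                    # point estimate, computed once
resid = y - X @ beta_hat

# Conventional variance: assumes homoskedastic, uncorrelated errors
sigma2 = resid @ resid / (N * T - X.shape[1])
se_conv = np.sqrt(np.diag(sigma2 * XtX_inv))

# Cluster-robust variance: sum the scores x_it * resid_it within each unit first
scores = X * resid[:, None]
S = np.zeros((N, X.shape[1]))
np.add.at(S, unit, scores)                      # S[g] = sum of scores in cluster g
meat = S.T @ S
se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("beta_hat:      ", np.round(beta_hat, 3))
print("conventional SE:", np.round(se_conv, 4))
print("clustered SE:   ", np.round(se_cluster, 4))
```

The same `beta_hat` feeds both variance formulas; only the standard errors change, and in this design the clustered ones are larger because both the regressor and the error persist within units.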

Exercise 8: Time Fixed Effects Can Be Collinear with Treatment

Measuring the Effect of an Increased Police Force

All Municipalities Treated Simultaneously

  • Panel of municipalities
    • All increase police on the same date
    • Vertical line: date of police increase
  • Drug usage appears to decline after treatment
    • But drug usage was already trending down before treatment

Day Fixed Effects Cannot Separate Treatment from Time

The Collinearity Trap

Ideal model: \(\text{drug usage}_{it} = \mu\;\text{post}_t + \theta_i + \rho_t + e_{it}\)

Conditional expectations — with \(T = 4\), treatment at \(t = 3\):

\[\begin{align*} \mathbb{E}[\text{drug usage}_{it} \mid t=1] &= \theta_i + \rho_1 \\ \mathbb{E}[\text{drug usage}_{it} \mid t=2] &= \theta_i + \rho_2 \\ \mathbb{E}[\text{drug usage}_{it} \mid t=3] &= \mu + \theta_i + \rho_3 \\ \mathbb{E}[\text{drug usage}_{it} \mid t=4] &= \mu + \theta_i + \rho_4 \end{align*}\]

  • \((\mu + \rho_3)\) observed jointly, not separately \(\Rightarrow\) \(\mu\) not identified
  • Reason: \(\text{post}_t\) is a linear combination of day dummies

A Linear Time Trend Restores Identification

Parametric but Estimable

\[\text{drug usage}_{it} = \mu\text{post}_t + \theta_i + \gamma t + e_{it}\]

Interpretation of \(\gamma\)

  • Average change per time unit
  • Imposes linearity — may miss curvature
  • Intermediate option: a polynomial trend (for example, adding \(t^2\))

The trade-off

| | Parametric trend | Day FE |
|---|---|---|
| Flexibility | Low (linear) | High (any shape) |
| Estimate \(\mu\)? | Yes | No (collinear) |
| Risk | Misspecified trend | No identification |

When treatment varies only at the time level, time FE absorb it completely.
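The collinearity can be checked numerically by comparing the rank of the design matrix with its number of columns. The sketch below uses the \(T = 4\), treatment-at-\(t = 3\) example; the number of municipalities is hypothetical, and unit fixed effects are left out for brevity since they do not affect the dependence between \(\text{post}_t\) and the day dummies.

```python
import numpy as np

N, T, k = 5, 4, 3                        # hypothetical: 5 municipalities, 4 days, treated from day 3
day = np.tile(np.arange(1, T + 1), N)
post = (day >= k).astype(float)
const = np.ones(N * T)

# Day dummies with day 1 as the reference category
day_dummies = np.column_stack([(day == s).astype(float) for s in range(2, T + 1)])

X_fe = np.column_stack([const, day_dummies, post])           # day FE + post
X_trend = np.column_stack([const, day.astype(float), post])  # linear trend + post

rank_fe = np.linalg.matrix_rank(X_fe)
rank_trend = np.linalg.matrix_rank(X_trend)

print(f"day FE:  {X_fe.shape[1]} columns, rank {rank_fe}")
print(f"trend:   {X_trend.shape[1]} columns, rank {rank_trend}")
```

A rank below the column count means at least one coefficient cannot be separately estimated; replacing the day dummies with a linear trend restores full column rank in this example.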

Exercise 9: Age Dummies Provide Nonparametric Functional Forms

When We Do Not Know the Functional Form, Use Dummies

Nonparametric Estimation

\[ \log(\text{earnings})_{it} = \alpha_i + \theta_t + \sum_{j=17}^{85} \gamma_j\;\mathbb{1}[\text{age}_{it} = j] + e_{it} \]

  • \(\alpha_i\): individual FE (absorbs ability, education, etc.)
  • \(\theta_t\): time FE (absorbs aggregate macroeconomic trends)
  • \(\gamma_j\): average difference in log-earnings between workers aged \(j\) and workers aged 16, holding constant \(\alpha_i\) and \(\theta_t\)
    • Nonparametric: no assumption on the shape of the age-earnings relation
    • Parametric: quadratic requires \(f(\text{age}) = \beta_1\;\text{age} + \beta_2\;\text{age}^2\)

Why log?

Imprecision at Extremes Reflects Thin Data

The Variance Formula for Dummy Variables

Each age dummy \(d_j = \mathbb{1}[\text{age}_{it} = j]\) is a binary variable with proportion \(p_j = n_j/n\):

\[ \text{Var}(\hat{\gamma}_j) \propto \frac{\sigma^2}{n \cdot p_j(1 - p_j)} \]

  • Most workers are aged 25-64 \(\Rightarrow\) \(p_j\) is largest at these ages \(\Rightarrow\) \(\text{Var}(\hat{\gamma}_j)\) small
  • Few workers at ages 16-24 and 65-85 \(\Rightarrow\) \(p_j \approx 0\) \(\Rightarrow\) \(\text{Var}(\hat{\gamma}_j)\) large
  • \(\hat{\gamma}_j\) at extreme ages has wide confidence intervals

Nonparametric flexibility comes at the cost of imprecision where data is thin.
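A worked comparison with hypothetical cell sizes (not taken from the data): suppose \(n = 10{,}000\), with 400 observations at a prime working age (\(p_j = 0.04\)) and 10 observations at an extreme age (\(p_j = 0.001\)).

\[ n\,p_j(1 - p_j) = 10{,}000 \times 0.04 \times 0.96 = 384 \qquad \text{versus} \qquad 10{,}000 \times 0.001 \times 0.999 = 9.99 \]

Under the proportionality above, the variance at the thin age cell is about \(384 / 9.99 \approx 38\) times larger, so its standard error is about \(\sqrt{38.4} \approx 6.2\) times larger.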

Exercise 10: Fixed Effects Decompose Treatment Into Incentive and Selection

Performance Pay Increases Firm-Level Productivity by 20%

But How Much Is Incentives vs Selection?

A firm introduces performance pay. The panel is unbalanced: some workers leave (exiters), some stay (stayers), some join (entrants).

\[ \widehat{\log(\text{productivity})}_{it} = \hat{\alpha}_i + \hat{\beta}\;\text{performance pay}_t \]

OLS (Pooled)

  • \(\hat{\beta}_{\text{OLS}} = 0.20\) (SE \(= 0.03\))
  • Captures total effect at firm level
    • Uses all workers (stayers + exiters + entrants)

Fixed Effects

  • \(\hat{\beta}_{\text{FE}} = 0.10\) (SE \(= 0.02\))
  • Captures within-worker incentive effect only
    • Only stayers contribute to identification

Half the Effect Is Incentives, Half Is Composition

The Decomposition

Incentive effect

  • \(\hat{\beta}_{\text{FE}} = 0.10\)
  • Same workers produce more under performance pay
    • They work harder when paid for output

Composition effect

  • \(\hat{\beta}_{\text{OLS}} - \hat{\beta}_{\text{FE}} = 0.20 - 0.10 = 0.10\)
  • Different workers join the firm under performance pay
    • \(\mathbb{E}[\alpha_i \mid \text{entrant}] > \mathbb{E}[\alpha_i \mid \text{exiter}]\)

OLS captures total change; FE isolates the within-unit mechanism. The difference is the selection channel.
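The decomposition can be sketched on simulated data with stayers, exiters, and entrants. All numbers below are hypothetical and chosen so that the incentive effect and the composition effect are each about 0.10; pooled OLS on a single policy dummy is computed as the difference in mean log productivity between the post and pre periods, and FE as the mean within-stayer change.

```python
import numpy as np

rng = np.random.default_rng(4)
n_stay, n_exit, n_enter = 1000, 500, 500     # hypothetical group sizes
incentive = 0.10                             # assumed within-worker effect of performance pay

# Worker effects alpha_i: entrants are better on average than exiters
a_stay = rng.normal(0.0, 0.1, n_stay)
a_exit = rng.normal(-0.2, 0.1, n_exit)
a_enter = rng.normal(0.1, 0.1, n_enter)

def noise(n):
    """Idiosyncratic shock to log productivity."""
    return rng.normal(0.0, 0.05, n)

y_stay_pre = a_stay + noise(n_stay)
y_stay_post = a_stay + incentive + noise(n_stay)
y_exit_pre = a_exit + noise(n_exit)                  # exiters observed only before the policy
y_enter_post = a_enter + incentive + noise(n_enter)  # entrants observed only after

# Pooled OLS on the policy dummy: post mean minus pre mean over whoever is present
b_ols = (np.concatenate([y_stay_post, y_enter_post]).mean()
         - np.concatenate([y_stay_pre, y_exit_pre]).mean())

# FE: average within-worker change, identified from stayers only
b_fe = (y_stay_post - y_stay_pre).mean()

composition = b_ols - b_fe
print(f"OLS: {b_ols:.3f}  FE: {b_fe:.3f}  composition: {composition:.3f}")
```

In this design the gap between the two estimates comes entirely from who is in the sample before and after the policy, not from any change in what a given worker produces.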

The Decomposition Requires No Correlated Shocks

OLS Estimates a Different Quantity, Not a Biased One

  • Assumes no other changes coincided with performance pay introduction
    • e.g., new management, different product mix, macroeconomic shifts
  • \(\hat{\beta}_{\text{FE}}\) is causal if: \(\mathbb{E}[u_{it} \mid \text{performance pay}_t, \alpha_i] = 0\)
    • Strict exogeneity conditional on individual FE
  • The unbalanced panel is central, not incidental
    • Composition effect detected precisely because workers change

Causal interpretation requires that no time-varying confounders coincide with the policy change.

Exercise 11: Worker Composition Changes Require Fixed Effects

Time-Varying Controls Address Observable Confounders

But Worker Quality Is Unobserved

\[ \text{productivity}_{it} = \beta_1\;\text{contingent}_t + \beta_2\;\text{weather}_t + \beta_3\;\text{width}_{it} + \beta_4\;\text{height}_{it} + e_{it} \]

  • Every worker is assigned to different zones each day, so we can control for plant-level conditions (weather, plant size).
  • Worker ability differs across periods
    • Best workers leave for blueberry harvest in August (second half)
    • Composition effect is negative: remaining workers are less productive
    • \(\text{contingent}_t\) varies only over time, so it is collinear with day dummies and day FE cannot be included

Adding Worker Fixed Effects Controls for Unobserved Ability

Controls Handle Observables; FE Handles Unobservables

\[\begin{align*} \text{productivity}_{it} &= \beta_1\;\text{contingent}_t + \beta_2\;\text{weather}_t + \beta_3\;\text{width}_{it} + \beta_4\;\text{height}_{it} \\ &+ \gamma_i + e_{it} \end{align*}\]

  • \(\gamma_i\): worker fixed effect — absorbs time-invariant ability
    • Same worker’s productivity compared across wage regimes
  • Controls address observable time-varying confounders (weather, plant size)
  • FE addresses unobservable worker-level confounders (ability, motivation)
  • However, only workers observed in both periods contribute to identification
    • If many workers churn, effective sample shrinks

Observable controls and fixed effects address different sources of bias.

Exercise 12: Multiple Fixed Effects Identify Leader Quality Through Rotation

Call Centre Productivity Varies Across Operators, Leaders, and Days

Three Sources of Variation

Setting

  • Outcome: calls answered per hour
  • Panel of operators \(\times\) days
  • Random call assignment: no operator-call selection
  • Operators and leaders rotate across teams
    • Rotation provides identifying variation

Three sources of heterogeneity

  • \(\lambda_i\): operator ability
  • \(\mu_j\): leader quality (the parameter of interest)
  • \(\theta_t\): day-level conditions (holidays, system outages)

Three-Way Fixed Effects Isolate Leader Quality

Rotation Is the Identification Condition

\[ \text{productivity}_{ijt} = \lambda_i + \mu_j + \theta_t + e_{ijt} \]

  • \(\lambda_i\): operator FE — absorbs intrinsic worker ability
  • \(\mu_j\): leader FE — the object of interest
  • \(\theta_t\): day FE — absorbs common daily shocks
  • Identification requires rotation: same operator observed under different leaders
    • Without rotation, \(\lambda_i\) and \(\mu_j\) are not separately identified

The F-Test Provides Evidence for Leader Quality Differences

Joint Significance of Leader Fixed Effects

Restricted vs unrestricted

  • Restricted (\(H_0\) true): \(\text{productivity}_{ijt} = \lambda_i + \theta_t + e_{ijt}\)
  • Unrestricted (\(H_1\)): \(\text{productivity}_{ijt} = \lambda_i + \mu_j + \theta_t + e_{ijt}\)

Five-step framework

  1. Choose \(\alpha = 0.05\)
  2. \(H_0: \mu_2 = \mu_3 = \cdots = \mu_J = 0\)
  3. \(F = \frac{(\text{RSS}_R - \text{RSS}_U)/q}{\text{RSS}_U/(n - K)}\;,\quad q = J - 1\)
  4. Reject \(H_0\) if \(F > F_{q,\; n-K,\; \alpha}\)
  5. If reject: leaders differ in quality

Multiple fixed effects require sufficient rotation across dimensions for identification.
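The F-statistic in step 3 can be sketched with NumPy on a simulated operator-by-day panel. Everything about the data is an assumption for illustration: the panel dimensions, the leader effects, and a leader drawn at random for each operator-day so that rotation holds by construction. The critical value in step 4 is not computed here and would come from an \(F_{q,\,n-K}\) table.

```python
import numpy as np

rng = np.random.default_rng(5)
n_ops, n_leaders, n_days = 20, 4, 30          # hypothetical panel dimensions

op = np.repeat(np.arange(n_ops), n_days)
day = np.tile(np.arange(n_days), n_ops)
leader = rng.integers(0, n_leaders, size=op.size)   # random rotation across leaders

lam = rng.normal(0.0, 1.0, n_ops)             # operator ability
mu = np.array([0.0, 0.5, 1.0, 1.5])           # assumed leader quality (leader 0 is the base)
theta = rng.normal(0.0, 0.5, n_days)          # day-level shocks
y = lam[op] + mu[leader] + theta[day] + rng.normal(0.0, 1.0, op.size)

def dummies(codes, n_levels, drop_first):
    """Indicator matrix for integer codes 0..n_levels-1."""
    D = (codes[:, None] == np.arange(n_levels)[None, :]).astype(float)
    return D[:, 1:] if drop_first else D

def rss(X):
    """Residual sum of squares from an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

# Restricted: operator FE (full set, acts as the constant) + day FE
X_r = np.column_stack([dummies(op, n_ops, False), dummies(day, n_days, True)])
# Unrestricted: add leader FE, with one leader omitted as the base category
X_u = np.column_stack([X_r, dummies(leader, n_leaders, True)])

n = y.size
K = np.linalg.matrix_rank(X_u)                # number of estimated parameters
q = n_leaders - 1                             # number of restrictions under H0
F = ((rss(X_r) - rss(X_u)) / q) / (rss(X_u) / (n - K))

print(f"n = {n}, K = {K}, q = {q}, F = {F:.1f}")
```

Because the simulated leaders differ substantially in quality, the statistic should come out far above conventional critical values; with leaders of equal quality it would typically be close to 1.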

Panel Data Provides Identification Through Within-Unit Variation

Summary (I)

  1. FD and LSDV address different components of unobserved heterogeneity
  2. Time effects capture common growth; interactions capture heterogeneous effects
  3. Measurement error is amplified by first-differencing — a fundamental trade-off
  4. Panel data cannot solve selection bias
  5. Clustered standard errors correct for within-unit serial correlation

Panel Data Provides Identification Through Within-Unit Variation

Summary (II)

  1. Time fixed effects can be collinear with treatment
  2. Dummies provide nonparametric functional forms at the cost of imprecision at extremes
  3. Fixed effects decompose total effects into within-unit incentive and composition channels
  4. Multiple fixed effects require rotation for identification; F-tests assess joint significance

Next Week: Topic 8 Part III

Difference-in-Differences and Natural Experiments

  • Difference-in-differences estimator and parallel trends (Q13, Q17)
  • Repeated cross-sections vs panel data for DiD (Q13, Q16)
  • Police discrimination: interactions as causal tests (Q14)
  • Natural experiments: cannabis policy and student grades (Q15)
  • Continuous treatment intensity with panel data (Q18)

Appendix: Detailed Derivations

First-Difference Derivation for General Panel Model

Write the model for \(t = 1\) and \(t = 2\):

\[ \begin{align*} y_{i1} &= \beta_0 + \beta_1 x_{i1,1} + \cdots + \beta_k x_{i1,k} + a_i + v_{i1} \\ y_{i2} &= (\beta_0 + \delta) + \beta_1 x_{i2,1} + \cdots + \beta_k x_{i2,k} + a_i + v_{i2} \end{align*} \]

Subtract: \(a_i - a_i = 0\).

\[ \Delta y_i = \delta + \beta_1\Delta x_{i1} + \beta_2\Delta x_{i2} + \cdots + \beta_k\Delta x_{ik} + \Delta v_i \]

The time-invariant component \(a_i\) has been eliminated. OLS on this equation yields consistent estimates under \(\mathbb{E}[\Delta v_i \mid \Delta x_{i1}, \ldots, \Delta x_{ik}] = 0\).

Attenuation Factor: Cross-Section vs First-Differences

Cross-section

\[ \text{plim}\hat\beta_{\text{CS}} = \beta \cdot \frac{\text{Var}(\text{educ}^*)}{\text{Var}(\text{educ}^*) + \text{Var}(e)} \]

First-differences (assuming uncorrelated measurement errors)

\[ \text{plim}\;\hat\beta_{\text{FD}} = \beta \cdot \frac{\text{Var}(\Delta\text{educ}^*)}{\text{Var}(\Delta\text{educ}^*) + \text{Var}(e_{i1}) + \text{Var}(e_{i2})} \]

  • Since education changes little over time:
    • \(\text{Var}(\Delta\text{educ}^*) \ll \text{Var}(\text{educ}^*)\).
    • The FD attenuation ratio is smaller than the CS ratio, so the bias towards zero is worse in FD

Why Log?

Three Reasons for Using Log-Earnings

  1. Skewness: raw earnings are right-skewed
    • \(\log(\text{earnings})\) yields a more symmetric distribution, closer to normality (supporting AS7)
  2. Multiplicative relationships: if earnings = base \(\times\) skill premium \(\times\) experience premium, then
    • \(\log(\text{earnings}) = \log(\text{base}) + \log(\text{skill premium}) + \log(\text{experience premium})\)
    • Log linearises multiplicative structures into additive ones
  3. Growth rates: \(\Delta\log(\text{earnings}) \approx \%\Delta\text{earnings}\)
    • Differences in logs approximate percentage changes
    • The natural unit for comparing workers across different baseline earnings

Age-Period-Cohort Identification Problem

Why Age, Time, and Individual FE Cannot All Be Included

\[ \log(\text{earnings})_{it} = \alpha_i + \theta_t + \sum_{j=17}^{85} \gamma_j\;\mathbb{1}[\text{age}_{it} = j] + e_{it} \]

suffers from a fundamental identification problem:

\[ \text{age}_{it} = \text{year}_t - \text{birth year}_i \]

  • \(\text{birth year}_i\) is time-invariant, so it is absorbed by \(\alpha_i\)
  • \(\text{year}_t\) is captured by \(\theta_t\)
  • Therefore the linear component of age is perfectly collinear with \(\alpha_i\) and \(\theta_t\), and the age dummies are not all identified without a further restriction
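The dependence can be checked numerically on a tiny hypothetical panel by building individual, year, and age dummies and comparing the rank of the design matrix with its number of columns. The birth years and calendar years below are made up for illustration; the check shows that an exact linear dependence remains even after one year dummy and one age dummy are omitted.

```python
import numpy as np

birth = np.array([1980, 1981, 1982])         # hypothetical birth years, one per individual
years = np.arange(2000, 2004)                # hypothetical calendar years 2000-2003

ind = np.repeat(np.arange(birth.size), years.size)
year = np.tile(years, birth.size)
age = year - birth[ind]                      # age = year - birth year, by construction

def dummies(codes, drop_first):
    """Indicator matrix with one column per distinct value of codes."""
    levels = np.unique(codes)
    D = (codes[:, None] == levels[None, :]).astype(float)
    return D[:, 1:] if drop_first else D

X = np.column_stack([
    dummies(ind, False),     # individual FE (full set, plays the role of the constant)
    dummies(year, True),     # year FE, first year omitted
    dummies(age, True),      # age dummies, youngest age omitted
])

rank = np.linalg.matrix_rank(X)
print(f"{X.shape[1]} columns, rank {rank}")
```

A rank below the column count confirms that at least one more normalisation is needed before the age profile can be estimated alongside individual and year effects.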

OLS vs FE Decomposition — Formal

Stayers (\(S\)), Exiters (\(X\)), Entrants (\(N\))

\[\begin{align*} \hat{\beta}_{\text{OLS}} &= \bar{y}_{\text{post}} - \bar{y}_{\text{pre}} \\ &= \left[\frac{|S|}{|S|+|N|}\bar{y}^S_{\text{post}} + \frac{|N|}{|S|+|N|}\bar{y}^N_{\text{post}}\right] \\ &\quad - \left[\frac{|S|}{|S|+|X|}\bar{y}^S_{\text{pre}} + \frac{|X|}{|S|+|X|}\bar{y}^X_{\text{pre}}\right] \end{align*}\]

  • \(\hat{\beta}_{\text{OLS}}\): weighted across all workers (stayers + exiters + entrants)
  • \(\hat{\beta}_{\text{FE}} = \frac{1}{|S|}\sum_{i \in S}(y_{i,\text{post}} - y_{i,\text{pre}})\): averages within stayers only; \(\hat{\beta}_{\text{OLS}} - \hat{\beta}_{\text{FE}}\) captures composition

AKM: Matched Employer-Employee Framework

Abowd, Kramarz, and Margolis (1999, Econometrica)

The call centre model (Q12) is a simplified version of AKM’s framework. In AKM:

\[ \log(\text{wages})_{it} = \alpha_i + \psi_{J(i,t)} + x'_{it}\beta + e_{it} \]

  • \(\alpha_i\): worker fixed effect (ability, human capital)
  • \(\psi_{J(i,t)}\): firm fixed effect for the firm \(J\) employing worker \(i\) at time \(t\)
  • Identification requires worker mobility across firms (exogenous mobility assumption)
    • Same logic as Q12: operators rotate across teams
    • Without mobility, \(\alpha_i\) and \(\psi_{J(i,t)}\) are not separately identified

Perfect Collinearity: Day FE and Post

Formal Proof That \(\mu\) Is Not Identified

With day dummies \(d_1, \ldots, d_T\) and treatment at day \(k\):

\[ \text{post}_t = \sum_{s=k}^{T} \mathbb{1}[t = s] = d_k + d_{k+1} + \cdots + d_T \]

  • \(\text{post}_t\) is an exact linear combination of day dummies
  • With \(d_1\) as reference: \(T-1\) dummies plus \(\text{post}_t\) give \(T\) variables with \(T-1\) independent columns
  • The design matrix \(X\) is rank-deficient by 1, so \(X'X\) is singular \(\Rightarrow\) \(\mu\) not identified
  • Resolution: replace day dummies with a parametric trend (\(\gamma\;\text{time}_t\))
